schedulers,test: avoid some test branches not being reached and remove schedulePeerPr #8087
@@ -527,25 +534,12 @@ func checkHotWriteRegionScheduleByteRateOnly(re *require.Assertions, enablePlace
clearPendingInfluence(hb.(*hotScheduler))
tc.SetHotRegionScheduleLimit(int(opt.GetHotRegionScheduleLimit()))

for i := 0; i < 20; i++ {
This part duplicates L500-L522. This branch, which tests transfer leader, was never reached, so it had not been updated before.
pdServerCfg := tc.GetPDServerConfig()
pdServerCfg.FlowRoundByDigit = 8
Add this config so that the test branches are actually reached.
re.Equal("move-hot-write-leader", op.Desc())
operatorutil.CheckTransferLearner(re, op, operator.OpHotRegion, 8, 10)
re.Equal("move-hot-write-peer", op.Desc())
operatorutil.CheckTransferLearner(re, op, operator.OpHotRegion, 8, 0)
It may be scheduled to store 10 or store 11.
Is it possible to make the result deterministic?
done
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8087      +/-   ##
==========================================
- Coverage   77.21%   77.12%   -0.10%
==========================================
  Files         470      470
  Lines       61671    61681      +10
==========================================
- Hits        47622    47574      -48
- Misses      10468    10516      +48
- Partials     3581     3591      +10
// schedulePeerPr the probability of schedule the hot peer.
schedulePeerPr = 0.66
schedulePeerPr = defaultSchedulePeerPr
// pendingAmpFactor will amplify the impact of pending influence, making scheduling slower or even serial when two stores are close together
pendingAmpFactor = 2.0
pendingAmpFactor = defaultPendingAmpFactor
// If the distribution of a dimension is below the corresponding stddev threshold, then scheduling will no longer be based on this dimension,
// as it implies that this dimension is sufficiently uniform.
stddevThreshold = 0.1
stddevThreshold = defaultStddevThreshold
// topnPosition is the position of the topn peer in the hot peer list.
// We use it to judge whether to schedule the hot peer in some cases.
topnPosition = 10
topnPosition = defaultTopnPosition
These variables are too complicated. Moreover, the current hot tests are hard to understand and debug.
Yes, I will try to reduce these variables and make the tests clearer in later PRs.
Signed-off-by: lhy1024 <admin@liudos.us>
ops, _ = hs.Schedule(tc, false)
re.Len(ops, 1)
operatorutil.CheckTransferPeer(re, ops[0], operator.OpHotRegion, 1, 4)
for i := 0; i < 100; i++ {
Why change it?
pd/pkg/schedule/schedulers/hot_region.go
Lines 409 to 430 in e7c9d15
func (h *hotScheduler) balanceHotWriteRegions(cluster sche.SchedulerCluster) []*operator.Operator {
	// prefer to balance by peer
	s := h.r.Intn(100)
	switch {
	case s < int(schedulePeerPr*100):
		peerSolver := newBalanceSolver(h, cluster, utils.Write, movePeer)
		ops := peerSolver.solve()
		if len(ops) > 0 && peerSolver.tryAddPendingInfluence() {
			return ops
		}
	default:
	}
	leaderSolver := newBalanceSolver(h, cluster, utils.Write, transferLeader)
	ops := leaderSolver.solve()
	if len(ops) > 0 && leaderSolver.tryAddPendingInfluence() {
		return ops
	}
	hotSchedulerSkipCounter.Inc()
	return nil
}
On master, if the write-peer branch was randomly chosen but no operator was created, the scheduler fell through and made another attempt to generate a write-leader operator.
Now, to simplify the code, that fallback no longer exists.
So I added this check to avoid a panic when the write-peer branch is randomly chosen.
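The difference described in this reply can be sketched in isolation. The following is a minimal illustration and not PD's actual code: `solver`, `noOps`, `leaderOps`, `scheduleWithFallback`, and `scheduleWithoutFallback` are hypothetical names, and the shape of the version without fallback is an assumption based only on the comment above. It shows why a test that indexes `ops[0]` needs a length check once the fallback is gone.

```go
package main

import "fmt"

// solver is a hypothetical stand-in for a balance solver. It returns the
// descriptions of the operators it created, or nil when it created none.
type solver func() []string

// noOps and leaderOps are illustrative solvers, not PD functions.
func noOps() []string     { return nil }
func leaderOps() []string { return []string{"move-hot-write-leader"} }

// scheduleWithFallback follows the shape of the quoted master code: when the
// peer branch is picked but yields nothing, it falls through to the leader solver.
func scheduleWithFallback(pickPeer bool, peer, leader solver) []string {
	if pickPeer {
		if ops := peer(); len(ops) > 0 {
			return ops
		}
	}
	return leader()
}

// scheduleWithoutFallback is an assumed simplification: exactly one solver
// runs per call, so the result can be empty when the peer branch is picked.
func scheduleWithoutFallback(pickPeer bool, peer, leader solver) []string {
	if pickPeer {
		return peer()
	}
	return leader()
}

func main() {
	// With the fallback, the leader solver still produces an operator.
	fmt.Println(len(scheduleWithFallback(true, noOps, leaderOps)))

	// Without it, the caller must guard against an empty result before
	// indexing, otherwise ops[0] panics.
	ops := scheduleWithoutFallback(true, noOps, leaderOps)
	if len(ops) == 0 {
		fmt.Println("no operator this round")
		return
	}
	fmt.Println(ops[0])
}
```

Under these assumptions, a test that loops over many scheduling rounds has to skip the rounds that return no operator instead of asserting on `ops[0]` unconditionally.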
@@ -121,7 +122,7 @@ type baseHotScheduler struct {
// this records regionID which have pending Operator by operation type. During filterHotPeers, the hot peers won't
// be selected if its owner region is tracked in this attribute.
regionPendings map[uint64]*pendingInfluence
types []utils.RWType
types []resourceType
How about using a clearer name instead of resourceType?
Do you have any good ideas about this name?
Maybe we can merge rwTy and resourceType in another PR?
> Maybe we can merge rwTy and resourceType in another PR?
+1
return h.balanceHotReadRegions(cluster)
case utils.Write:
return h.balanceHotWriteRegions(cluster)
case writePeer:
Why do we need to distinguish write?
pd/pkg/statistics/collector.go
Line 68 in e7c9d15
// while the Leader and Follower are under different loads (usually the Leader consumes more CPU).
/test pull-integration-realcluster-test
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: HuSharp, rleungx
What problem does this PR solve?
Issue Number: Close #8073
What is changed and how does it work?
schedulePeerPr is always 1.0, so some tests can never reach the transfer-leader branch.
Check List
Tests
Release note